release/v0.61.1 #2895
Conversation
GitHub CI does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it.
Pull Request Overview
This PR bumps the version from 0.60.2 to 0.61.1 and introduces significant enhancements to the evaluation system, particularly around custom evaluations, evaluator revisions, and metric handling. The changes span the web frontend (both the OSS and EE editions) and the Python SDK.
Key changes include:
- Introduction of a "custom" evaluation type throughout the codebase, alongside the existing "auto", "human", and "online" types
- Implementation of evaluator revision fetching and merging logic to support version-based evaluator definitions (see the first sketch after this list)
- Enhanced metric column factory with improved slug resolution, nested metric support, and better type inference from statistics
- SDK workflow improvements making slug parameters optional for built-in evaluators
- New SDK model structures for evaluations, testsets, and git-based artifact management
- Improved metric key normalization and fallback resolution in the focus drawer (see the second sketch after this list)
- CSV export functionality for custom evaluations (see the third sketch after this list)
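
As a rough illustration of the revision-merging idea, the sketch below overlays a revision's versioned fields on top of the base evaluator definition. The shapes `EvaluatorDto` and `EvaluatorRevisionDto` and the helper `mergeEvaluatorWithRevision` are assumptions for this sketch, not the actual API in `web/oss/src/state/evaluators/atoms.ts`.

```typescript
// Hypothetical DTO shapes; the real types live in web/oss/src/lib/hooks/useEvaluators/types.ts.
interface EvaluatorDto {
    id: string
    slug: string
    name: string
    settings?: Record<string, unknown>
}

interface EvaluatorRevisionDto {
    evaluatorId: string
    version: number
    name?: string
    settings?: Record<string, unknown>
}

// Sketch: overlay the revision's versioned fields onto the base evaluator,
// keeping base values wherever the revision does not define them.
function mergeEvaluatorWithRevision(
    base: EvaluatorDto,
    revision?: EvaluatorRevisionDto,
): EvaluatorDto {
    if (!revision) return base
    return {
        ...base,
        name: revision.name ?? base.name,
        settings: {...(base.settings ?? {}), ...(revision.settings ?? {})},
    }
}
```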
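For the metric-key normalization and fallback resolution point, a minimal sketch of the general approach is shown below: normalize the raw key, then try progressively less specific candidates. The helper names, the normalization rule, and the candidate order are assumptions, not the focus drawer's actual code.

```typescript
// Sketch: normalize a metric key into a dotted, lowercase form.
const normalizeMetricKey = (key: string): string =>
    key.trim().toLowerCase().replace(/[\s/]+/g, ".")

// Sketch: resolve a metric value by trying the evaluator-qualified key first,
// then the normalized key, then its last path segment.
function resolveMetricValue(
    metrics: Record<string, number | undefined>,
    rawKey: string,
    evaluatorSlug?: string,
): number | undefined {
    const key = normalizeMetricKey(rawKey)
    const candidates = [
        evaluatorSlug ? `${evaluatorSlug}.${key}` : undefined,
        key,
        key.split(".").pop(),
    ].filter((k): k is string => Boolean(k))
    for (const candidate of candidates) {
        if (metrics[candidate] !== undefined) return metrics[candidate]
    }
    return undefined
}
```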
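For the CSV export of custom evaluations, the following is a loose sketch of flattening run rows into CSV text. The `EvalRunRow` shape, the column names, and the `toCsv` helper are made up for illustration and do not reflect the PR's actual implementation.

```typescript
// Hypothetical row shape for illustration only.
interface EvalRunRow {
    testcaseId: string
    metrics: Record<string, string | number | null>
}

// Sketch: flatten rows into a CSV string, quoting values that contain commas,
// quotes, or newlines.
function toCsv(rows: EvalRunRow[], metricKeys: string[]): string {
    const escape = (value: unknown): string => {
        const text = value === null || value === undefined ? "" : String(value)
        return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
    }
    const header = ["testcase_id", ...metricKeys].map(escape).join(",")
    const lines = rows.map((row) =>
        [row.testcaseId, ...metricKeys.map((k) => row.metrics[k])].map(escape).join(","),
    )
    return [header, ...lines].join("\n")
}
```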
Reviewed Changes
Copilot reviewed 113 out of 118 changed files in this pull request and generated 55 comments.
| File | Description |
|---|---|
| web/package.json | Version bump to 0.61.1 |
| web/oss/package.json | Version bump to 0.61.1 |
| web/ee/package.json | Version bump to 0.61.1 |
| web/oss/src/state/evaluators/atoms.ts | Added evaluator revision fetching and merging logic with new utility functions |
| web/oss/src/state/app/hooks.ts | Enhanced app filtering logic to exclude SDK evaluation apps |
| web/oss/src/state/app/atoms/fetcher.ts | Added filtering and new app detail query atom |
| web/oss/src/lib/hooks/useEvaluators/types.ts | Added EvaluatorRevisionDto types |
| web/oss/src/lib/hooks/useEvaluators/index.ts | Removed unused rest parameter |
| web/oss/src/lib/Types.ts | Fixed semicolon formatting |
| web/ee/src/lib/metricColumnFactory.tsx | Major refactor with nested metric support, improved slug resolution, and type inference from statistics (see the sketch after this table) |
| web/ee/src/components/pages/evaluations/* | Added "custom" evaluation type support throughout |
| web/ee/src/components/HumanEvaluations/assets/utils.tsx | Enhanced metric collection and evaluator slug resolution |
| web/ee/src/components/EvalRunDetails/* | Updated to support custom evaluation type |
| sdk/pyproject.toml | Version bump and dependency updates |
| sdk/agenta/sdk/workflows/utils.py | Renamed PARAMETERS_REGISTRY to CONFIGURATION_REGISTRY |
| sdk/agenta/sdk/workflows/builtin.py | Made slug parameter optional for all built-in workflows |
| sdk/agenta/sdk/utils/references.py | New utility for slug generation |
| sdk/agenta/sdk/utils/client.py | New authenticated API client utility |
| sdk/agenta/sdk/models/* | New model files for workflows, evaluations, testsets, git, and blobs |
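
For the type-inference change in `web/ee/src/lib/metricColumnFactory.tsx`, one plausible rule, shown purely as an illustration, is to look at which aggregated statistics a metric carries and pick a column type accordingly. The statistic field names and the `inferColumnType` helper below are assumptions, not the factory's real code.

```typescript
type MetricColumnType = "numeric" | "boolean" | "text"

// Hypothetical statistics shape; the real field names may differ.
interface MetricStats {
    mean?: number
    min?: number
    max?: number
    trueCount?: number
    falseCount?: number
}

// Sketch: infer a column type from whichever statistics are present,
// falling back to a plain text column.
function inferColumnType(stats?: MetricStats): MetricColumnType {
    if (!stats) return "text"
    if (stats.trueCount !== undefined || stats.falseCount !== undefined) return "boolean"
    if (stats.mean !== undefined || stats.min !== undefined || stats.max !== undefined) {
        return "numeric"
    }
    return "text"
}
```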